Public Web Data Intake System Implementation

upwork.com 🟡 2026-06-01

👤 Client: 🇺🇸 United States, member since 2026-03-25
💰 Price: ****
🚩 Problem: Need for maintainable, production-ready scrapers to populate an existing research pipeline from diverse public web sources.
📦 Existing: Core repository, database schema, stub source adapters, and ingestion flow (Python + Supabase).

Specifications:

[Target]: Public directories, vendor pages, documentation, blogs, news, public databases, APIs, RSS, paginated search pages
[Method]: Incremental ingestion, deduplication via content hashing, pagination handling, rate limiting, retries
[Stack]: Python, Supabase/Postgres, requests, httpx, BeautifulSoup, trafilatura, scrapy, playwright
[Format]: Structured records containing source URL, title, raw text, metadata, timestamps, and content hashes
[Security]: Lawful collection of public data only; no authentication bypass, paywall circumvention, or CAPTCHA solving
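
To make the [Format] and [Method] items concrete, here is a minimal Python sketch of one possible record shape with a content hash. The names `Record`, `hash_text`, and `fetched_at` are illustrative assumptions, not taken from the client's repository, and hashing whitespace-normalized text with SHA-256 is just one reasonable choice.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def hash_text(raw_text: str) -> str:
    """SHA-256 over whitespace-normalized text (assumed normalization rule)."""
    normalized = " ".join(raw_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def utc_now_iso() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Record:
    # Hypothetical field names mirroring the [Format] line of the posting.
    source_url: str
    title: str
    raw_text: str
    metadata: dict = field(default_factory=dict)
    fetched_at: str = field(default_factory=utc_now_iso)
    content_hash: str = ""

    def __post_init__(self) -> None:
        # Derive the hash from the text unless the caller supplied one.
        if not self.content_hash:
            self.content_hash = hash_text(self.raw_text)
```

Normalizing whitespace before hashing means a page that is re-fetched with different line wrapping still maps to the same hash; whether that is desirable depends on how strict the pipeline's notion of "changed content" should be.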

Workflow:

1. Review existing repository and source-adapter interface
2. Implement a production-quality source collector
3. Integrate data storage with Supabase
4. Validate deduplication on repeat execution
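
Step 4 can be sketched as an idempotency check: run the same batch twice and confirm the second run inserts nothing. The sketch below uses an in-memory dict as a stand-in for the database table; `ingest` and `hash_text` are hypothetical helpers, and in Postgres the same guarantee would more likely come from a unique constraint on the hash column.

```python
import hashlib


def hash_text(raw_text: str) -> str:
    """SHA-256 over whitespace-normalized text (assumed normalization rule)."""
    normalized = " ".join(raw_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def ingest(records: list[dict], store: dict[str, dict]) -> int:
    """Insert records whose content hash is not yet in the store.

    `store` is an in-memory stand-in for the real table, keyed by hash.
    Returns the number of newly inserted records.
    """
    inserted = 0
    for rec in records:
        key = hash_text(rec["raw_text"])
        if key in store:
            continue  # duplicate content, skip
        store[key] = {**rec, "content_hash": key}
        inserted += 1
    return inserted


if __name__ == "__main__":
    batch = [
        {"source_url": "https://example.com/a", "raw_text": "Alpha page"},
        {"source_url": "https://example.com/b", "raw_text": "Beta page"},
        {"source_url": "https://example.com/a2", "raw_text": "Alpha  page"},
    ]
    table: dict[str, dict] = {}
    print(ingest(batch, table))  # first run
    print(ingest(batch, table))  # repeat run should insert nothing new
```

Keying on the content hash alone treats identical text at two URLs as one record; if the pipeline needs to keep both, the key could instead combine the URL and the hash.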
